Schema-Less, Semantics-Based Change Detection for XML Documents
Abstract
Schema-less change detection is the process of comparing successive versions of an XML document or data collection to determine which portions are the same and which have changed, without using a schema. Change detection can be used to reduce space in a historical data collection and to support temporal queries. Most previous research has focused on detecting structural changes between document versions, but techniques that depend on structure break down when the structural change is significant. This paper develops an algorithm for detecting change based on the semantics, rather than on the structure, of a document. The algorithm is based on the observation that information that identifies an element is often conserved across changes to a document. The algorithm first isolates identifiers for elements. It then uses these identifiers to associate elements in successive versions.
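The two steps named in the abstract (isolate identifiers, then associate elements across versions) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the heuristic for choosing an identifier (a text-only child that occurs once in every element of a given tag and whose value is unique), the function names, and the sample documents are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def leaf_values(elem):
    """Return (child tag, text) pairs for the element's text-only children."""
    return [(c.tag, (c.text or "").strip())
            for c in elem
            if len(c) == 0 and (c.text or "").strip()]


def find_identifiers(root):
    """Hypothetical heuristic: for each element tag, pick a child tag that
    occurs exactly once in every such element and has unique values."""
    by_tag = {}
    for elem in root.iter():
        by_tag.setdefault(elem.tag, []).append(elem)
    identifiers = {}
    for tag, elems in by_tag.items():
        if len(elems) < 2:
            continue  # a single element needs no identifier to be told apart
        candidates = None
        for e in elems:
            counts = Counter(t for t, _ in leaf_values(e))
            once = {t for t, n in counts.items() if n == 1}
            candidates = once if candidates is None else candidates & once
        # Ties are broken alphabetically, an arbitrary choice for this sketch.
        for child_tag in sorted(candidates or ()):
            values = [dict(leaf_values(e))[child_tag] for e in elems]
            if len(set(values)) == len(values):
                identifiers[tag] = child_tag
                break
    return identifiers


def index_by_identifier(root, identifiers):
    """Index elements by (tag, identifier value), ignoring their position."""
    index = {}
    for elem in root.iter():
        child_tag = identifiers.get(elem.tag)
        if child_tag is None:
            continue
        value = dict(leaf_values(elem)).get(child_tag)
        if value is not None:
            index[(elem.tag, value)] = elem
    return index


def associate(old_root, new_root):
    """Match elements across versions by identifier rather than by structure."""
    identifiers = find_identifiers(old_root)
    old_index = index_by_identifier(old_root, identifiers)
    new_index = index_by_identifier(new_root, identifiers)
    matched = sorted(old_index.keys() & new_index.keys())
    deleted = sorted(old_index.keys() - new_index.keys())
    inserted = sorted(new_index.keys() - old_index.keys())
    return matched, deleted, inserted


# Made-up sample versions: the second one is restructured (new root, an extra
# nesting level, reordered children), yet the book with isbn 111 still matches.
OLD = ("<library><book><isbn>111</isbn><title>A</title></book>"
       "<book><isbn>222</isbn><title>B</title></book></library>")
NEW = ("<catalog><shelf><book><title>A2</title><isbn>111</isbn></book></shelf>"
       "<book><isbn>333</isbn><title>C</title></book></catalog>")

matched, deleted, inserted = associate(ET.fromstring(OLD), ET.fromstring(NEW))
```

Because the match key is the identifier value and not the element's path, the restructured book is still associated with its earlier version, which is the situation where purely structural differencing tends to break down.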
Similar resources
XS-Diff: XML schema change detection algorithm
Detecting changes in XML data has emerged as an important research issue in the last decade, but the majority of change detection algorithms focus on XML documents rather than on their schemas because documents that contain data are deemed more significant than the schema itself. However, the XML schema change detection tool is essential, especially in situations where we need to maintain relat...
F2/XML: Managing XML Document Schema Evolution
XML has become an emerging standard for data representation and data exchange on the Web. Although XML data is self-describing, most application domains tend to use document schemas. Over a period of time, these schemas need to be modified to reflect a change in the real-world, a change in the user’s requirements, mistakes or missing information in the initial design. Most of the current XML ma...
DTD-Diff: A Change Detection Algorithm for DTDs
The DTD of a set of XML documents may change for many reasons, such as changes to real-world events, changes to the user’s requirements, and mistakes in the initial design. In this paper, we present a novel algorithm called DTD-Diff to detect changes to the DTDs that define the structure of a set of XML documents. Such a change detection tool can be useful in several ways such as maintenan...
Schema versioning in tXSchema-based multitemporal XML repositories
τXSchema [7] is a framework (a language and a suite of tools) for the creation and validation of time-varying XML documents. A τXSchema schema is composed of a conventional XML Schema document annotated with physical and logical annotations. All components of a τXSchema schema (i.e., conventional schema, logical annotations, and physical annotations) can change over time to reflect changes in us...
UUXML: A Type-Preserving XML Schema-Haskell Data Binding
An XML data binding is a translation of XML documents into values of some programming language. This paper discusses a type-preserving XML–Haskell data binding that handles documents typed by the W3C XML Schema standard. Our translation is based on a formal semantics of Schema, and has been proved sound with respect to the semantics. We also show a program in Generic Haskell that constructs pars...